Skip to content

fix(batch): cancel in-flight tasks when InfrahubBatch.execute() exits#1002

Open
PhillSimonds wants to merge 1 commit intodevelopfrom
phill-fix-batch-orphan-tasks
Open

fix(batch): cancel in-flight tasks when InfrahubBatch.execute() exits#1002
PhillSimonds wants to merge 1 commit intodevelopfrom
phill-fix-batch-orphan-tasks

Conversation

@PhillSimonds
Copy link
Copy Markdown
Contributor

Why

InfrahubBatch.execute() orphans in-flight tasks when one task raises with return_exceptions=False. Sibling work continues silently on the event loop; the caller has been told the batch failed and cannot retry; failed siblings surface only as Task exception was never retrieved GC warnings. Under a transient server-slowness window, this can yield build exit status 0 against a partial graph.

Closes #1001

What changed

  • infrahub_sdk/batch.py: wrapped the asyncio.as_completed iteration in try/finally that cancels any still-running task and gather(..., return_exceptions=True) to drain. Covers all generator exit paths (raise, caller break, generator close), not only the original raise path.
  • Tests:
    • test_execute_does_not_orphan_inflight_tasks_when_raising — regression guard for the orphan behavior. Asserts that sibling task side effects do not run to completion after execute() raises.
    • test_return_exceptions_yields_exceptions_indistinguishably_from_successes — pins the current return_exceptions=True contract (failures yield as (node, ExceptionInstance) in the same tuple shape as successes). Expected to change when that API is reworked; tracked separately.

How to review

Start with infrahub_sdk/batch.py. The diff is small and self-contained. The two new tests in tests/unit/sdk/test_batch.py run without HTTPX mocks (~0.4s total).

How to test

uv run pytest tests/unit/sdk/test_batch.py -v

Expected: 8 passed.

Impact & rollout

  • Backward compatibility: behavior change only on the failure path. Healthy batches behave identically. Callers that previously relied on orphaned siblings continuing in the background after their batch raised will no longer get that — that reliance was undefined behavior (and silent data loss).
  • Performance: unchanged.
  • Config/env changes: none.
  • Deployment notes: safe to deploy.

Checklist

  • Tests added/updated
  • Changelog entry added (changelog/1001.fixed.md)
  • External docs updated (n/a — internal behavior fix)
  • Internal .md docs updated (n/a)

InfrahubBatch.execute() created one asyncio.Task per BatchTask but did
not clean them up when the generator stopped iterating. With
return_exceptions=False, the first failing task caused execute() to
re-raise while siblings kept running on the event loop with no caller
awaiting them. Successful sibling work was discarded silently;
failing siblings surfaced only as "Task exception was never retrieved"
GC warnings.

Wrap the as_completed loop in try/finally that cancels any still-
running task and gather()s with return_exceptions=True. Covers all
generator exit paths: explicit raise, caller break, generator close.

Closes #1001
@PhillSimonds PhillSimonds requested a review from a team as a code owner May 8, 2026 17:41
@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying infrahub-sdk-python with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2d17f96
Status: ✅  Deploy successful!
Preview URL: https://3a664adc.infrahub-sdk-python.pages.dev
Branch Preview URL: https://phill-fix-batch-orphan-tasks.infrahub-sdk-python.pages.dev

View logs

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
infrahub_sdk/batch.py 80.00% 1 Missing and 1 partial ⚠️
@@             Coverage Diff             @@
##           develop    #1002      +/-   ##
===========================================
- Coverage    81.43%   81.42%   -0.02%     
===========================================
  Files          134      134              
  Lines        11359    11352       -7     
  Branches      1703     1705       +2     
===========================================
- Hits          9250     9243       -7     
  Misses        1566     1566              
  Partials       543      543              
Flag Coverage Δ
integration-tests 41.85% <60.00%> (-0.06%) ⬇️
python-3.10 54.36% <80.00%> (-0.05%) ⬇️
python-3.11 54.38% <80.00%> (-0.03%) ⬇️
python-3.12 54.36% <80.00%> (-0.03%) ⬇️
python-3.13 54.36% <80.00%> (-0.03%) ⬇️
python-3.14 54.36% <80.00%> (-0.03%) ⬇️
python-filler-3.12 22.73% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
infrahub_sdk/batch.py 94.73% <80.00%> (+0.37%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant